Computerization of African languages-French dictionaries

Authors

  • Chantal Enguehard
  • Mathieu Mangeot
Abstract

This paper reports on work carried out in the DiLAF project, which converts five bilingual African language-French dictionaries, originally in Word format, into XML following the LMF model. The languages covered are Bambara, Hausa, Kanuri, Tamajaq and Songhai-Zarma, all still considered under-resourced with respect to natural language processing tools. Once converted, the dictionaries are made available online on the Jibiki platform for lookup and modification. The paper first presents the DiLAF project and describes each dictionary. It then sets out the methodology for converting .doc files to XML, with a specific discussion of Unicode usage, and details each step of the conversion into XML and LMF. The final part presents Jibiki, the lexical resource management platform used for the project.
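To give a rough idea of the kind of target structure involved, the sketch below builds one LMF-style entry in Python. It is only an illustration under stated assumptions: the element and attribute names (`LexicalEntry`, `Lemma`, `Sense`, `Equivalent`, `feat`) follow the commonly cited LMF XML serialization, the helper names `feat` and `make_entry` are hypothetical, the headword is a made-up placeholder rather than a real dictionary entry, and the actual DiLAF schema and conversion scripts are not shown in this abstract. The NFC normalization step is likewise an assumption about how Unicode consistency might be enforced, not the authors' documented procedure.

```python
import unicodedata
import xml.etree.ElementTree as ET


def feat(parent, att, val):
    """Attach an LMF-style <feat att="..." val="..."/> child to parent."""
    return ET.SubElement(parent, "feat", {"att": att, "val": val})


def make_entry(headword, pos, french):
    """Build one hypothetical LexicalEntry (headword, part of speech, French gloss)."""
    # NFC composes a base letter plus combining mark into a single
    # precomposed code point wherever Unicode defines one.
    headword = unicodedata.normalize("NFC", headword)
    french = unicodedata.normalize("NFC", french)

    entry = ET.Element("LexicalEntry")
    feat(entry, "partOfSpeech", pos)

    lemma = ET.SubElement(entry, "Lemma")
    feat(lemma, "writtenForm", headword)

    sense = ET.SubElement(entry, "Sense")
    equivalent = ET.SubElement(sense, "Equivalent")
    feat(equivalent, "language", "fra")
    feat(equivalent, "writtenForm", french)
    return entry


# Placeholder input: "b" + "a" + COMBINING ACUTE ACCENT (not a real entry).
entry = make_entry("ba\u0301", "noun", "exemple")
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

Normalizing before serialization means that visually identical headwords typed with different code point sequences end up as the same string, which matters for lookup once the entries are online.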

Similar resources

Vers l'informatisation de quelques langues d'Afrique de l'Ouest (Towards the computerization of some west-african languages) [in French]

Chantal Enguehard (1), Soumana Kané (2), Mathieu Mangeot (3), Issouf Modi (4), Mamadou Lamine Sanogo (5). (1) LINA, 2 rue de la Houssinière, BP 92208, 44322 Nantes Cedex 03, France; (2) CNR-ENF, BP 62, Bamako, Mali; (3) LIG, BP 53, 38041 Grenoble, France; (4) MEN/A/PLN/DGPLN/DREL, BP 557, Niamey, Niger; (5) CNRST, BP 7047, Ouagadougou 03, Burkina Faso. Mathieu.Mangeot@i...

Automatic Diacritic Restoration for Resource-Scarce Languages

The orthography of many resource-scarce languages includes diacritically marked characters. Falling outside the scope of the standard Latin encoding, these characters are often represented in digital language resources as their unmarked equivalents. This renders corpus compilation more difficult, as these languages typically do not have the benefit of large electronic dictionaries to perform di...

Conversion of Lexicon-Grammar tables to LMF: application to French

In this chapter, we describe the first experiment of conversion of Lexicon-Grammar tables for French verbs into the LMF format. The Lexicon-Grammar of the French language is currently one of the major sources of lexical and syntactic information for French. Its conversion into an interoperable representation format according to the LMF standard makes it usable in different contexts, thus contri...

On multiword lexical units and their role in maritime dictionaries

Multi-word lexical units are a typical feature of specialized dictionaries, in particular monolingual and bilingual maritime dictionaries. The paper studies the concept of the multi-word lexical unit and considers the similarities and differences of their selection and presentation in monolingual and bilingual maritime dictionaries. The work analyses such issues as the classification of multi-w...

Combining Corpus and Machine-Readable Dictionary Data for Building Bilingual

This paper describes and discusses some theoretical and practical problems arising from developing a system to combine the structured but incomplete information from machine readable dictionaries (MRDs) with the unstructured but more complete information available in corpora for the creation of a bilingual lexical data base, presenting a methodology to integrate information from both sources in...

Journal:
  • CoRR

Volume: abs/1405.5893

Publication date: 2014